Chapter 2: Collections

-- A Python Course for the Humanities by Folgert Karsdorp and Maarten van Gompel, with modifications by Mike Kestemont and Lars Wieneke


Lists

Consider the sentence below:


In [ ]:
sentence = "Python's name is derived from the television series Monty Python's Flying Circus."

Words are made up of characters, and so are strings in Python, like the string stored in the variable sentence in the block above. For the sentence above, it might seem more natural for humans to describe it as a series of words, rather than as a series of characters. Say we want to access the first word in our sentence. If we type in:


In [ ]:
first_word = sentence[0]
print(first_word)

Python only prints the first character of our sentence. (Think about this if you do not understand why.) We can transform our sentence into a list of words (represented by strings) using the split() function as follows:


In [ ]:
words = sentence.split()
print(words)

Make sure that you understand the syntax of this code! We call things like split() 'functions': functions provide small pieces of helpful, ready-made functionality that we can use in our own code. Here, we apply the split() function to the variable sentence and we assign the result of the function (we call this the 'return value' of the function) to the new variable words.

By default, the split() function in Python will split strings on the spaces between consecutive words and it will return a list of words. However, we can pass an argument to split() that explicitly specifies the string we would like to split on. In the code block below, we will split a string on commas, instead of spaces. Do you get the syntax?


In [ ]:
fruitstring = "banana,pear,apple"
fruitlist = fruitstring.split(",")
print(fruitlist)
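
The difference between the default behaviour and an explicit delimiter shows up as soon as a string contains more than one space in a row. Here is a small sketch; the example string and the variable names spaced, by_whitespace and by_single_space are made up for this illustration:

```python
# A made-up example string with a double space after "Monty"
spaced = "Monty  Python's Flying Circus"

# Without an argument, split() treats any run of whitespace as one separator
by_whitespace = spaced.split()
print(by_whitespace)

# With an explicit " " argument, every single space counts as a separator,
# so the double space leaves an empty string in the result
by_single_space = spaced.split(" ")
print(by_single_space)
```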

The reverse of the split() function can be accomplished with join(). It turns a list into a string, using a specific 'delimiter': the string you want to place between the items when joining them.


In [ ]:
fruitlist = ['banana', 'pear', 'apple']
delimiter = ","
fruitstring = delimiter.join(fruitlist)
print(fruitstring)

The above four lines can be accomplished in a single line of code. Can you figure out how? (Tip: replace all variables by their values)


In [ ]:
# insert your oneliner here!

replace()

The replace() function is another function which can be called on a string. It will replace all occurrences of a specified substring with another string. Consider the lines in the code block below - and mind the order in which you pass the arguments to the function!


In [ ]:
text = "You can not compare apples and pears"
text = text.replace("pears", "apples")
text = text.replace("not ", "")
print(text)
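
Two details of replace() are easy to miss: it returns a new string instead of changing the one it is called on, and the order of the arguments matters. The sketch below illustrates both; the variable names original, changed and swapped are only chosen for this example:

```python
original = "You can not compare apples and pears"

# replace(old, new): the substring to look for comes first, its replacement second
changed = original.replace("pears", "oranges")

print(original)  # the original string is left untouched
print(changed)

# With the arguments swapped, Python looks for "oranges", finds nothing,
# and returns a string that is equal to the original
swapped = original.replace("oranges", "pears")
print(swapped == original)
```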

DIY


In [ ]:
text = "Research has shown that it is often still possible to understand text even if all vowels are removed"
# insert your code here.. I suppose it's obvious what we want you to do ;-)

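Python has several functions for changing the case of a string. lower() converts a string to lowercase characters, upper() returns an uppercased version, and capitalize() returns a version in which the first character is uppercased and all other characters are lowercased: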


In [ ]:
my_string = "AllCaps"
print(my_string)
my_string_upper = my_string.upper()
print(my_string_upper)
my_string_lower = my_string.lower()
print(my_string_lower)
my_string_capped = my_string.capitalize()
print(my_string_capped)
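
One common use of these functions is comparing strings while ignoring differences in case. A minimal sketch, with two made-up titles stored in the hypothetical variables title_a and title_b:

```python
title_a = "The Hunger Games"
title_b = "the hunger games"

# A plain comparison is case-sensitive
same_as_typed = title_a == title_b
print(same_as_typed)

# Lowercasing both sides first makes the comparison ignore case
same_ignoring_case = title_a.lower() == title_b.lower()
print(same_ignoring_case)
```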

DIY

  • Can you come up with your own sentence my_sentence and split it into words along spaces? Print the new list of words.
  • You can recognize functions because they are always followed by (round brackets). Apart from the split() function, we already encountered other functions, also in the previous chapter. Which ones? Can you describe their functionality? Are there any differences in terms of syntax when you compare these to split()?
  • Is there a difference in length between the variables sentence and words? (Use functions to find this out!)

In [ ]:
# your DIY code goes here...

In many ways, list variables are very similar to strings. For example, we can access their items using indexes and we can use slice indexes to access parts of a list. Let's try this out.

Write a small program that defines a variable first_word and assign to it the first word of our word list words from above. Do the same for the fifth word, the last word and the last but one word. Also, try to extract a slice from words and isolate the string of words between derived and Flying (the slice should not include derived and Flying). Also, make a slice of words that is identical to the title of the television series in words.


In [ ]:
# insert your code here

A list acts like some kind of container where we can store all kinds of information. We can access a list using indexes and slices. We can also add new items to a list. For that you use the function append(). Let's see how that works. Say we want to keep a list of all our good reads. We first declare an empty list using square brackets. Next, we add some good books to the list:


In [ ]:
# start with an empty list
good_reads = []
good_reads.append("The Hunger games")
print(good_reads)
good_reads.append("A Clockwork Orange")
print(good_reads)

Do you get the syntax that goes with the append() function? The list we wish to append the item to goes first and we join the append() function to this list using a dot (.). In between the round brackets that go with the function name, we place the actual string that we wish to add to the list. We call such an input value an 'argument' or a 'parameter' that we 'pass' to a function. Next, the function will return a 'return value'. Make sure that you are familiar with this terminology because you will often come across such terms when you look for help online!
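
In the case of append(), the return value is the special value None: the function changes the list itself rather than handing back a new one. The sketch below shows this with a hypothetical list called shelf; it also shows why you should not write something like shelf = shelf.append(...).

```python
shelf = []

# append() modifies the list in place ...
result = shelf.append("Bel Canto")
print(shelf)

# ... and its return value is None, not the updated list
print(result)
```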

Now, if for some reason we don't like a particular book anymore, we can change it as follows using the old item's index:


In [ ]:
good_reads[1] = "Pride and Prejudice"
print(good_reads)

As you see, it is no problem to reset or update an individual item in a list. This is different, however, for strings. Strings (and some other types) are immutable, which means that their characters cannot be changed using an index, as opposed to lists, which are mutable. An assignment such as name[2] = "X" would raise an error: this is your computer signalling that something is wrong. The code below works around this by first converting the string into a list of characters, changing one item in that list, and joining the characters back together into a new string.


In [ ]:
name = "Bonny"
print(name)
list_chars = list(name)
print(list_chars)
list_chars[2] = "X"
print(list_chars)
delimiter = ""
print(delimiter.join(list_chars))
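
To see the error for yourself without stopping the notebook, you could wrap the assignment as sketched below. Note that this uses a try/except construction, which this chapter has not introduced; treat it as an illustration only. The variable error_raised is made up for this example.

```python
name = "Bonny"
error_raised = False

try:
    # Strings are immutable, so assigning to an index is not allowed
    name[2] = "X"
except TypeError as error:
    error_raised = True
    print("Python refused:", error)

# The string itself is unchanged
print(name)
```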

DIY

Here's another small DIY! Add two new titles to the list of good_reads. Then, try to change the title of the second book in our good reads collection:


In [ ]:
# insert your code here

Lists are a really powerful way of dealing with your data in Python. Let's explore some other ways in which we can manipulate lists.

remove()

Let's assume our good read collection has grown a lot and we would like to remove some of the books from the list. Python provides the function remove() that you can call on a list and which takes as argument the item we would like to remove.


In [ ]:
good_reads = ["The Hunger games", "A Clockwork Orange", 
             "Pride and Prejudice", "Water for Elephants", "Illias", "Water for Elephants", "Water for Elephants"]
print(good_reads)
good_reads.remove("Water for Elephants")
print(good_reads)
good_reads.remove("Water for Elephants")
print(good_reads)

If we try to remove a book that is not in our collection, Python raises an error to signal that something is wrong.


In [ ]:
good_reads.remove("White Oleander")
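
One way to avoid this error is to check whether the item is present before removing it. The sketch below assumes the in operator and an if statement, neither of which has been introduced in this chapter, and uses a hypothetical list shelf so that good_reads stays as it is:

```python
shelf = ["The Hunger games", "A Clockwork Orange"]
unwanted = "White Oleander"

# Only call remove() when the item is really in the list
if unwanted in shelf:
    shelf.remove(unwanted)
else:
    print(unwanted, "is not in the collection")

print(shelf)
```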

Note, however, that remove() will only delete the first item in the list that is identical to the argument which you passed to the function. Execute the code in the block below and you will see that only the first instance of "Pride and Prejudice" gets deleted.


In [ ]:
good_reads = ["The Hunger games", "A Clockwork Orange", 
             "Pride and Prejudice", "Water for Elephants", "Pride and Prejudice"]
good_reads.remove("Pride and Prejudice")
print(good_reads)

Just as with strings, we can concatenate two lists using the + operator. Here is an example:


In [ ]:
# first we specify two lists of strings:
good_reads = ["The Hunger games", "A Clockwork Orange", 
              "Pride and Prejudice", "Water for Elephants",
              "The Shadow of the Wind", "Bel Canto"]
bad_reads = ["Fifty Shades of Grey", "Twilight", "The Hunger games"]

# then we combine them
all_reads = good_reads + bad_reads
print(all_reads)

good_reads += bad_reads
print(good_reads)
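
Note the difference between the two forms used above: + builds a new list and leaves both originals alone, whereas += extends the list on its left-hand side. A minimal sketch with two short, made-up lists called first and second:

```python
first = ["Bel Canto"]
second = ["Twilight"]

# + creates a new list; first and second are unchanged
combined = first + second
print(first)
print(combined)

# += adds the items of second to first itself
first += second
print(first)
```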

sort()

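It is always nice to organise your bookshelf. We can sort our collection alphabetically in two ways: the sorted() function returns a new, sorted list and leaves the original list untouched, while the sort() function sorts the list itself: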


In [ ]:
sorted_reads = sorted(good_reads)
good_reads.sort()
print(sorted_reads)
print(good_reads)
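
Because both prints above show the same result, the difference between sorted() and sort() is easy to overlook. The sketch below makes it visible by printing the list in between; the list reads and the other variable names are made up for this example:

```python
reads = ["Twilight", "Bel Canto", "A Clockwork Orange"]

# sorted() returns a new list ...
sorted_copy = sorted(reads)
print(sorted_copy)
# ... and the original is still in its old order
print(reads)

# sort() reorders the list itself and returns None
returned = reads.sort()
print(reads)
print(returned)
```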

nested lists

Until now our lists only consisted of strings. However, a list can contain all kinds of data types, such as integers and even lists! Do you understand what is happening in the following example? Have a close look at the square brackets used.


In [ ]:
nested_list = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(nested_list[0])
print(nested_list[0][0])
print(nested_list[1][2])
print(nested_list[0][:-2])
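
It can help to read a double index in two steps: the first index selects an inner list, the second selects an item inside that inner list. A small sketch with a made-up nested list called grid:

```python
grid = [["a", "b"], ["c", "d"], ["e", "f"]]

# Step 1: pick the second inner list
row = grid[1]
print(row)

# Step 2: pick the first item of that inner list
print(row[0])

# Both steps at once give the same item
item = grid[1][0]
print(item)

# Inner lists are mutable too, so items can be updated the same way
grid[2][1] = "z"
print(grid)
```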

We can put this to use to enhance our good read collection with a score for every book we have. An entry in our collection will consist of the title of our book and a score between 1 and 10. The first element is the title; the second is the score: [title, score]. We initialize an empty list:


In [ ]:
good_reads = []

And add two books to it:


In [ ]:
good_reads.append(["Pride and Prejudice", 8])
good_reads.append(["A Clockwork Orange", 9])
print(good_reads)



lists, tuples, sets

DIY

Update the good_reads collection with some of your own books and give them all a score and a publication year by nesting lists. Can you print out the score you gave to the first book in the list? And the publication year of the third item in your list? (Hint: you can pile up indexes using square brackets!)


In [ ]:
# insert your code here
print(good_reads[2])

What we have learnt

To finish this section, here is an overview of the new concepts you have learnt. Go through them and make sure you understand them all.

  • lists
  • nested lists
  • mutable versus immutable
  • .split() vs. .join()
  • .append()
  • .remove()
  • .sort()
  • .upper() vs. .lower()

Dictionaries

Our little good reads collection is starting to look quite impressive and we can perform all kinds of manipulations on it. Now, imagine that our list is large and we would like to look up the score we gave to a particular book. How are we going to find that book? For this purpose Python provides another, more appropriate data structure, called a dictionary. A dictionary is similar to the dictionaries you have at home. It consists of entries, or keys, that hold a value. Let's define one:


In [ ]:
my_dict = {"book": "physical objects consisting of a number of pages bound together",
           "sword": "a cutting or thrusting weapon that has a long metal blade",
           "pie": "dish baked in pastry-lined pan often with a pastry top"}
my_dict['word'] = "a cutting or thrusting weapon that has a long metal blade"
print(my_dict)

Take a close look at the new syntax. Notice the curly brackets and the colons. To look up the value of a given key, we 'index' the dictionary using that key (again, between square brackets):


In [ ]:
description = my_dict["sword"]
print(description)
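
Indexing a dictionary with a key that it does not contain raises an error (a KeyError). As a sketch of two ways to look before you leap, the code below uses the in operator and the get() function on a small, hypothetical dictionary called glossary:

```python
glossary = {"sword": "a cutting or thrusting weapon",
            "pie": "dish baked in a pastry-lined pan"}

# The in operator tells us whether a key is present
has_sword = "sword" in glossary
has_shield = "shield" in glossary
print(has_sword, has_shield)

# get() returns None for a missing key instead of raising an error ...
missing = glossary.get("shield")
print(missing)

# ... or a fallback value of our own choosing
fallback = glossary.get("shield", "no entry")
print(fallback)
```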

Like lists, dictionaries are mutable, which means we can add entries to them and remove entries from them. Let's define an empty dictionary and add some books to it. The titles will be our keys and the scores their values. Watch the syntax to add a new entry:


In [ ]:
good_reads = {}
good_reads["Pride and Prejudice"] = 8
good_reads["A Clockwork Orange"] = 9
print(good_reads["Pride and Prejudice"])

In a way, this is similar to what we have seen before when we altered our book list. There we indexed the list using an integer to access a particular book. Here we directly use the title of the book. Note that the keys in a dictionary must be unique: why would that be?
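
As a hint, the sketch below shows what happens when the same key is assigned twice, and how an entry can be removed again with the del statement. The dictionary scores is a made-up stand-in for good_reads:

```python
scores = {}
scores["Pride and Prejudice"] = 8

# Assigning to an existing key replaces its value; no second entry is created
scores["Pride and Prejudice"] = 10
print(scores)
number_of_entries = len(scores)
print(number_of_entries)

# del removes the entry that belongs to a key
del scores["Pride and Prejudice"]
print(scores)
```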

DIY

Update the new good reads data structure with your own books. Try to print out the score you gave for one of the books which you added.


In [ ]:
# put your code here

keys(), values()

To retrieve a list of all the books we have in our collection, we can ask the dictionary to return its keys as a list:


In [ ]:
books = good_reads.keys()
books = list(books)
print(books)
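
Converting with list() matters because keys() itself does not return a list, and so cannot be indexed directly. Once converted, the titles behave like any other list. A minimal sketch with a hypothetical dictionary called scores:

```python
scores = {"Pride and Prejudice": 8, "A Clockwork Orange": 9}

# Turn the keys into a real list
titles = list(scores.keys())
print(titles)

# Now list operations such as indexing and sorting are available
first_title = titles[0]
print(first_title)
alphabetical = sorted(titles)
print(alphabetical)
```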

Similarly we can ask for the values:


In [ ]:
print(good_reads.values())

What we have learnt

To finish this section, here is an overview of the new concepts and functions you have learnt. Make sure you understand them all.

  • dictionary
  • indexing dictionaries and accessing values through their keys
  • adding items to a dictionary
  • .keys()
  • .values()

Final Exercises Chapter 2

Inspired by Think Python by Allen B. Downey (http://thinkpython.com) and Introduction to Programming Using Python by Y. Liang (Pearson, 2013). Some exercises below have been taken from: http://www.ling.gu.se/~lager/python_exercises.html.

  • Ex. 1: Consider the following strings sentence1 = "Mike and Lars kick the bucket" and sentence2 = "Bonny and Clyde are really famous". Split these strings into words and create the following strings via list manipulation: sentence3 = "Mike and Lars are really famous" and sentence4="Bonny+and+Clyde+kick+the+bucket" (mind the plus signs!). Can you print the middle letter of the fourth sentence?

In [ ]:
# sentences

  • Ex. 2: Consider the lookup dictionary below. The following letters are still missing from it: 'k':'kilo', 'l':'lima', 'm':'mike'. Add them to lookup! Could you spell the word "marvellous" in code language now? Collect these codes into the list object msg. Next, join the items in this list together with a comma and print the spelled out version!

lookup = {'a':'alfa', 'b':'bravo', 'c':'charlie', 'd':'delta', 'e':'echo', 'f':'foxtrot', 'g':'golf', 'h':'hotel', 'i':'india', 'j':'juliett', 'n':'november', 'o':'oscar', 'p':'papa', 'q':'quebec', 'r':'romeo', 's':'sierra', 't':'tango', 'u':'uniform', 'v':'victor', 'w':'whiskey', 'x':'x-ray', 'y':'yankee', 'z':'zulu'}


In [ ]:
# lookup code

  • Ex. 3: Collect the code terms in the lookup dict (alfa, bravo, ...) from the previous exercise into a list called code_words. Is this list alphabetically sorted? No? Then make sure that this list is sorted alphabetically. Now remove the items victor, india and papa. Append the words pigeon and potato at the end of this list. Combine this new list of items into a single string, using a semicolon as a delimiter and print this string.

In [ ]:
# follow-up lookup code

You've reached the end of Chapter 2!